26 research outputs found

    Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

    One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods. However, no horizontal fragmentation technique yet uses a decision tree. This paper analyzes different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed in Weka 3.9.4, considering four evaluation metrics (Precision, ROC Area, Recall and F-measure) for different selected data sets using the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
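    As a rough illustration of three of the four metrics named in this abstract, the sketch below computes Precision, Recall and F-measure for a single positive class from true and predicted labels. The function name and the sample labels are hypothetical and are not taken from the paper; ROC Area is omitted because it requires classifier scores rather than hard labels.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for one class (illustrative only)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 3 actual positives, 3 predicted positives, 2 in common.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```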

    Classification of mexican paper currency denomination by extracting their discriminative colors

    In this paper we describe a machine vision approach to recognizing the denomination classes of Mexican paper currency by extracting their color features. A banknote’s color is characterized by summing the color vectors of all the image’s pixels to obtain a resultant vector; the banknote’s denomination is then classified from the orientation of the resultant vector within the RGB space. To obtain a more precise characterization of paper currency, the less discriminative colors of each denomination are eliminated from the images; the color selection is applied in the RGB and HSV spaces separately. Experimental results with the current Mexican banknotes are presented. Project PROMEP 103.5/13/653.
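    The resultant-vector idea in this abstract can be sketched as follows. The abstract does not say how orientation is compared, so the cosine-similarity rule here is an assumption, and the class labels and reference directions are hypothetical placeholders rather than values from the paper.

```python
import math


def resultant_vector(pixels):
    """Sum the RGB vectors of all pixels into a single resultant vector."""
    return tuple(sum(p[i] for p in pixels) for i in range(3))


def unit(vector):
    """Normalize a vector so that only its orientation remains."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0:
        raise ValueError("zero vector has no orientation")
    return tuple(c / norm for c in vector)


def classify_by_orientation(pixels, references):
    """Pick the label whose reference direction is closest in angle.

    Assumption: closeness is measured by cosine similarity between unit
    vectors; the paper may use a different orientation criterion.
    """
    direction = unit(resultant_vector(pixels))

    def cosine(label):
        return sum(a * b for a, b in zip(direction, unit(references[label])))

    return max(references, key=cosine)


# Hypothetical data: two reddish pixels and two made-up reference directions.
pixels = [(200, 30, 40), (180, 50, 60)]
references = {"reddish": (1.0, 0.2, 0.2), "bluish": (0.2, 0.3, 1.0)}
label = classify_by_orientation(pixels, references)
```

    Because the vector is normalized before comparison, the sketch depends only on the color direction, not on image size or overall brightness.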

    Segmentation of images by color features: a survey

    This article reviews the state of the art in color image segmentation. Image segmentation is an important stage in object recognition, and many methods have been proposed in the last few years for grayscale and color images. We present an in-depth review of color image segmentation methods, explaining the techniques based on edge detection, thresholding, histogram thresholding, regions, feature clustering and neural networks. Because color spaces play a key role in the methods reviewed, we also explain in detail the color spaces most commonly used to represent and process colors. In addition, we present some important applications that use the segmentation methods reviewed. Finally, a set of metrics frequently used to evaluate segmented images quantitatively is shown.

    Data selection based on decision tree for SVM classification on large data sets

    Support Vector Machine (SVM) has important properties such as a strong mathematical background and better generalization capability than other classification methods. On the other hand, the major drawback of SVM lies in its training phase, which is computationally expensive and highly dependent on the size of the input data set. In this study, a new algorithm to speed up SVM training is presented; the method selects a small, representative subset of the data set to reduce the training time of SVM. The novel method uses an induction tree to reduce the training data set for SVM, producing a very fast and high-accuracy algorithm. According to the results, the proposed algorithm achieves accuracy similar to that of current SVM implementations while running faster. Project UAEM 3771/2014/C

    Complex identification of plants from leaves

    A proposal is presented for recognizing leaves that are very similar in appearance. The automatic identification of plant leaves is a very important current research topic in vision systems. Several researchers have tried to solve the problem of identifying plants from their leaves, proposing various techniques. The techniques proposed in the literature have obtained excellent results on data sets where the leaves have dissimilar features. However, in cases where the leaves are very similar to each other, the classification accuracy falls significantly. In this paper, we propose a system to deal with the performance problem of machine learning algorithms when the leaves are very similar. The results obtained show that combining different features with a feature selection process can improve the classification accuracy.

    PSO-based method for svm classification on skewed data-sets

    Support Vector Machines (SVM) have shown excellent generalization power in classification problems. However, on skewed data-sets, SVM learns a biased model that affects the classifier performance, which is severely damaged when the imbalance ratio is very large. In this paper, a new external balancing method for applying SVM on skewed data-sets is developed. In the first phase of the method, the separating hyperplane is computed. Support vectors are then used to generate the initial population of the PSO algorithm, which is used to improve the population of artificial instances and to eliminate noisy instances. Experimental results demonstrate the ability of the proposed method to improve the performance of SVM on imbalanced data-sets. Project UAEM 3771/2014/CI

    Towards Association Rule-based Item Selection Strategy in Computerized Adaptive Testing

    One of the most important stages of Computerized Adaptive Testing (CAT) is the selection of items, for which various methods are used that have certain weaknesses at the time of implementation. Therefore, this paper proposes integrating Association Rule Mining as an item selection criterion in a CAT system. We present the analysis of association rule mining algorithms such as Apriori, FP-Growth, PredictiveApriori and Tertius on two data sets in order to learn the advantages and disadvantages of each algorithm and choose the most suitable one. We compare the algorithms in terms of the number of rules discovered, average support and confidence, and speed. According to the experiments, Apriori found rules with greater confidence and support in less time.
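    The support and confidence measures used in this comparison can be illustrated with a minimal sketch. The item identifiers and transactions below are hypothetical examples, not data from the paper, and the functions use the standard textbook definitions rather than any particular Weka implementation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule A -> C."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical response records: each set lists the items answered correctly.
transactions = [
    {"item1", "item2"},
    {"item1", "item2", "item3"},
    {"item1"},
    {"item3"},
]
rule_support = support(transactions, {"item1", "item2"})
rule_confidence = confidence(transactions, {"item1"}, {"item2"})
```

    Here the rule item1 -> item2 holds in 2 of 4 records (support 0.5) and in 2 of the 3 records that contain item1 (confidence 2/3).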

    Un método de fragmentación híbrida para bases de datos multimedia

    Hybrid partitioning has been recognized as a technique to achieve query optimization in relational and object-oriented databases. Due to the increasing availability of multimedia applications, there is an interest in using partitioning techniques in multimedia databases in order to take advantage of the reduction in the number of pages required to answer a query and to minimize data exchange among sites. Nevertheless, until now only vertical and horizontal partitioning have been used in multimedia databases. This paper presents a hybrid partitioning method for multimedia databases. This method takes into account the size of the attributes and the selectivity of the predicates in order to generate hybrid partitioning schemes that reduce the execution cost of the queries. A cost model for evaluating hybrid partitioning schemes in distributed multimedia databases was developed. Experiments on a multimedia database benchmark were performed in order to demonstrate the efficiency of our approach.

    Analysis of medical opinions about the nonrealization of autopsies in a Mexican hospital using association rules and bayesian networks

    This research identifies the factors influencing the reduction of autopsies in a hospital in Veracruz. The study applies data mining techniques such as association rules and Bayesian networks to data sets obtained from physicians' opinions. For the exploration and extraction of knowledge, we analyzed algorithms such as Apriori, FPGrowth, PredictiveApriori, Tertius, J48, NaiveBayes, MultilayerPerceptron, and BayesNet, all provided by the WEKA API. We also developed a web application to generate mining models and present the new knowledge in natural language. The results presented in this study are those obtained from the best-evaluated algorithms, which have been validated by specialists in the field of pathology.

    Software para el aprendizaje de la lectoescritura basado en control gestual de manos y realidad aumentada

    With technological advances, various projects have developed satisfactory solutions to learning problems, mainly in mathematics and medicine, using augmented reality and human-machine interfaces; applying these technologies to literacy learning remains an area of opportunity. According to the latest PISA test, administered in 2018, Mexico sits at level 2 in reading comprehension, below the average of the member countries of the Organisation for Economic Co-operation and Development (OECD). In response to this problem, a tool was developed that combines augmented reality and human-machine interfaces applied to reading and writing. The application displays rendered three-dimensional models through a device with a camera, allowing users to interact with these models through hand-movement recognition. The application was successfully tested with kindergarten students divided into two groups: those behind and those ahead in learning the letters. The results showed that the tool is entertaining and effective, providing a significant resource for teachers in teaching reading and writing.